Shallow Information Extraction from Medical Forum Data
Authors
Abstract
We study a novel shallow information extraction problem: extracting sentences that belong to a given set of topic categories from medical forum data. Given a corpus of medical forum documents, our goal is to extract two related types of sentences that together describe a biomedical case: medical problem descriptions and medical treatment descriptions. Such an extraction task directly generates medical case descriptions that can be useful in many applications. We solve the problem using two popular machine learning methods, Support Vector Machines (SVM) and Conditional Random Fields (CRF), and propose novel features to improve extraction accuracy. Experimental results show that we can obtain an accuracy of up to 75%.
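To make the sequence-labeling view of this task concrete, the following is a minimal sketch of CRF-style decoding over the sentences of a single forum post. It is not the authors' model: the label set, the keyword cues, and every score below are hypothetical values chosen for illustration, whereas a real CRF would learn its weights from labeled data. The sketch only shows how per-sentence evidence and label-to-label transitions can be combined with Viterbi decoding.

```python
from typing import Dict, List

# Hypothetical label set: the two target categories plus a catch-all.
LABELS = ["PROBLEM", "TREATMENT", "OTHER"]

# Hypothetical keyword cues; these are NOT the features proposed in the paper.
CUES: Dict[str, List[str]] = {
    "PROBLEM": ["pain", "symptom", "diagnosed", "suffering"],
    "TREATMENT": ["prescribed", "took", "surgery", "medication"],
}

# Hypothetical hand-set transition scores between labels of adjacent sentences.
TRANSITIONS: Dict[str, Dict[str, float]] = {
    "PROBLEM": {"PROBLEM": 0.5, "TREATMENT": 0.5, "OTHER": 0.0},
    "TREATMENT": {"PROBLEM": 0.0, "TREATMENT": 0.5, "OTHER": 0.0},
    "OTHER": {"PROBLEM": 0.0, "TREATMENT": 0.0, "OTHER": 0.5},
}


def emission_scores(sentence: str) -> Dict[str, float]:
    """Score each label for one sentence by counting matching cue words."""
    words = [w.strip(".,!?") for w in sentence.lower().split()]
    scores = {label: 0.0 for label in LABELS}
    for label, cues in CUES.items():
        scores[label] = float(sum(1 for w in words if w in cues))
    # Fixed default so that sentences without cues fall back to OTHER.
    scores["OTHER"] = 0.75
    return scores


def viterbi(sentences: List[str]) -> List[str]:
    """Return the highest-scoring label sequence for a list of sentences."""
    if not sentences:
        return []
    emissions = [emission_scores(s) for s in sentences]
    best = [dict(emissions[0])]          # best[t][label]: best path score ending in label
    back: List[Dict[str, str]] = []      # back[t-1][label]: best previous label
    for t in range(1, len(sentences)):
        best.append({})
        back.append({})
        for cur in LABELS:
            prev = max(LABELS, key=lambda p: best[t - 1][p] + TRANSITIONS[p][cur])
            best[t][cur] = best[t - 1][prev] + TRANSITIONS[prev][cur] + emissions[t][cur]
            back[t - 1][cur] = prev
    # Follow the back-pointers from the best final label.
    path = [max(LABELS, key=lambda label: best[-1][label])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))


post = [
    "I have been suffering from back pain for weeks.",
    "My doctor prescribed medication last month.",
    "Thanks for reading.",
]
labels = viterbi(post)
```

An SVM-based variant would instead classify each sentence independently from its feature vector; the transition scores are what allow a sequence model to exploit the tendency of treatment descriptions to follow problem descriptions.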
Related Resources
A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model
Information extraction (IE) is the process of automatically producing a structured representation from unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP) that has been intensified by the growing volume, heterogeneity, and unstructured form of information. One of the core information extraction tasks is relation extraction wh...
Patient-Centered Information Extraction for Effective Search on Healthcare Forum
Online healthcare forums are one of the major social media in Health 2.0 for patients and caregivers to share personal experience and to help each other. However, current forums do not support effective information search and thus users are unable to fully leverage the rich information in the forums. In this work, we propose patient-centered information extraction to better organize the informa...
Making Shallow Look Deeper: Anaphora and Comparisons in Medical Information Extraction
The paper focuses on resolving natural language issues which have been affecting performance of our system processing Polish medical data. In particular, we address phenomena such as ellipsis, anaphora, comparisons, coordination and negation occurring in mammogram reports. We propose practical data-driven solutions which allow us to improve the system’s performance.
IIT TREC 2007 Genomics Track: Using Concept-Based Semantics in Context for Genomics Literature Passage Retrieval
For the TREC-2007 Genomics Track [1], we explore unsupervised techniques for extracting semantic information about biomedical concepts with a retrieval model for using these semantics in context to improve passage retrieval precision. Dependency grammar analysis is evaluated for boosting the rank of passages where complementary subject/object concept pairs can be identified between queries and ...
Linguistic Processing of Texts Using Geppetto
We describe the linguistic analyzer of a prototype for Information Extraction from texts. The analyzer uses information derived from a shallow processor to limit the computational cost of the analysis. At the same time, shallow techniques are used to collapse parse fragments when a complete parse is not possible. The linguistic analyzer has been built using GePpeTto, an environment that allows...